Take Home Exercise 1

Analysing and Visualising Spatio-temporal Patterns of COVID-19 in DKI Jakarta, Indonesia.

Teo Jun Peng https://teojp3-is415.netlify.app/
09-09-2021

1. Introduction

This study analyses and visualises the spatio-temporal patterns of COVID-19 in DKI Jakarta, Indonesia. Out of Indonesia's 34 provinces, DKI Jakarta was the one most affected by the pandemic, accounting for close to 24% of cumulative confirmed cases. However, these cases were not evenly distributed within the province, so this analysis sets out to identify which sub-districts recorded the highest numbers of cases and how the overall distribution changed over time.

1.1 The Data

For this analysis, the following data are used:

P.S. As per the source, the March 2020 dataset only starts from 25 March 2020, so it may not be fully representative of March 2020. Similarly, for the January 2021 dataset, 30 January 2021 is the latest available date (the link for 31 January 2021 is broken), so it may not be fully representative of January 2021.

2. Setting up the environment

Several R packages, such as tidyverse, sf and tmap, will be used for an efficient and more comprehensive analysis.

load("THE1_workspace.Rdata")
packages = c("tidyverse", "sf", "readxl", "readr", "stringr", "tmap", "lemon", "formatR")
for (p in packages) {
    if (!require(p, character.only = T)) {
        install.packages(p)
    }
    library(p, character.only = T)
}

3. Data Wrangling

3.1 Aspatial Data

3.1.1 Importing and creating a list

The list.files function creates a list of the data files to import. All the files are imported at once by matching a pattern against the file names, which is more efficient than importing them individually.

The R function lapply complements this process, and is also used to add the file name as an additional column to each dataframe (so that the data can be displayed by month later on).

file_list <- list.files(path = "data/the1data/COVID-DATA", pattern = "*.xlsx", full.names = T)
df_list <- lapply(seq_along(file_list), function(x) transform(read_xlsx(file_list[x]),
    MonthYear = file_list[x]))

3.1.2 Looking through the data and initial data cleaning

From here, we manually check through df_list to find which Meninggal column is the correct one for each file, for reference in the coalesce step later on (e.g. for February 2021, Meninggal…1 is the correct column to use, since it is the Meninggal column with no NA values).

The inspection tells us that Meninggal…23 to Meninggal…25 are not used, so we can skip those columns in the coalesce step later on.

df_list
Visualisation of df_list, where we checked through the correct Meninggal columns to use
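The manual inspection above can also be done programmatically. The sketch below is a hypothetical helper (not part of the original workflow) that counts the NA values in every Meninggal column of each imported dataframe; the column with zero NAs is the correct one to use. Column names such as Meninggal…1 follow readxl's automatic de-duplication of repeated headers.

```r
# Hypothetical helper: for each dataframe in df_list, count NA values in
# every Meninggal column, so the correct column can be identified
# programmatically instead of by eye.
check_meninggal <- function(df_list) {
    lapply(df_list, function(df) {
        cols <- grep("^Meninggal", names(df), value = TRUE)
        sapply(df[cols], function(col) sum(is.na(col)))
    })
}
# check_meninggal(df_list) returns one named vector of NA counts per file.
```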

3.1.3 Combine df_list into a dataframe

We then combine df_list into a single dataframe using the ldply function from the plyr package.

library(plyr)
df <- ldply(df_list, data.frame)
Visualisation of df dataframe

3.1.4 Conversion of data types for certain columns

To combine/integrate values from the various Meninggal columns, we first convert the Meninggal…26 column to double so that the coalesce function can be used later on (it is originally of type chr, and a chr column cannot be coalesced with a double column).

July 2020 uses Meninggal…26 as its correct column, so we have to carry out the conversion.

df$Meninggal...26 = as.double(df$Meninggal...26)

df <- df %>%
    na_if("N/A")

3.1.5 Coalesce process for NA/missing values

In this step, we aim to combine the ID_KEL columns into one main column, since some of the files have different layouts. Coalesce takes values from another column wherever the first is missing (so where ID_KEL has an NA value but ID_KEL…1 has a value, the value from ID_KEL…1 is used).

df <- df %>%
    mutate(ID_KEL = coalesce(ID_KEL, ID_KEL...1, ID_KEL...2))
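As a minimal illustration of how coalesce() behaves (toy vectors, not the actual dataset):

```r
x <- c(1, NA, 3)
y <- c(9, 2, 9)
dplyr::coalesce(x, y)  # returns 1 2 3: the NA in x is filled from y
```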

Similarly, we want to combine the Meninggal columns into one, since some of the files have different layouts. From the earlier inspection of df_list we know that only Meninggal…26 and Meninggal…28 to Meninggal…31 are used, so we only integrate those columns together.

df <- df %>%
    mutate(Meninggal = coalesce(Meninggal, Meninggal...28, Meninggal...29, Meninggal...30,
        Meninggal...31, Meninggal...26))

3.1.6 Subsetting the necessary columns from dataframe

Next, we will get only the required columns out from the dataframe.

library(dplyr)

df2 <- df %>%
    select("MonthYear", "ID_KEL", "Nama_provinsi", "nama_kota", "nama_kecamatan",
        "nama_kelurahan", "POSITIF", "Meninggal")

So df2 is the modified dataframe; its current output is shown below. lemon_print is used to render a more aesthetically pleasing table.

head(df2)
Table 1: Modified dataframe df2 (Rendered with lemon_print)
MonthYear ID_KEL Nama_provinsi nama_kota nama_kecamatan nama_kelurahan POSITIF Meninggal
data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx NA NA NA NA TOTAL 339735 5478
data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx 3172051003 DKI JAKARTA JAKARTA UTARA PADEMANGAN ANCOL 834 9
data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx 3173041007 DKI JAKARTA JAKARTA BARAT TAMBORA ANGKE 617 8
data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx 3175041005 DKI JAKARTA JAKARTA TIMUR KRAMAT JATI BALE KAMBANG 755 15
data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx 3175031003 DKI JAKARTA JAKARTA TIMUR JATINEGARA BALI MESTER 358 8
data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx 3175101006 DKI JAKARTA JAKARTA TIMUR CIPAYUNG BAMBU APUS 870 13

3.1.7 Cleaning up MonthYear column so as to sort the values by month

As shown above, the MonthYear column is very messy and needs cleaning up. We will clean it with the str_replace function, removing the unnecessary characters from its values.

df2 <- df2 %>%
    mutate_at("MonthYear", str_replace, "data/the1data/COVID-DATA/Standar Kelurahan Data Corona",
        "")

df2 <- df2 %>%
    mutate_at("MonthYear", str_replace, "[(]", "")

df2 <- df2 %>%
    mutate_at("MonthYear", str_replace, "[)]", "")

df2 <- df2 %>%
    mutate_at("MonthYear", str_replace, ".xlsx", "")
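The four str_replace calls above can also be collapsed into a single pass with str_remove_all from stringr; this sketch is an equivalent alternative, not the code used in the analysis:

```r
# Remove the path prefix, the parentheses and the extension in one pass.
df2 <- df2 %>%
    mutate(MonthYear = str_remove_all(
        MonthYear,
        "data/the1data/COVID-DATA/Standar Kelurahan Data Corona|[()]|\\.xlsx"
    ))
```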

We will then turn MonthYear column into date data type using Sys.setlocale and as.Date functions.

Sys.setlocale(locale = "ind")
[1] "LC_COLLATE=Indonesian_Indonesia.1252;LC_CTYPE=Indonesian_Indonesia.1252;LC_MONETARY=Indonesian_Indonesia.1252;LC_NUMERIC=C;LC_TIME=Indonesian_Indonesia.1252"
df2$MonthYear <- as.Date(df2$MonthYear, format = "%d %B %Y")

Next, we delete the wrong/unnecessary rows (rows with NA in ID_KEL, rows where the ID_KEL column contains a sub-district name, etc.).

df2 <- df2[!is.na(df2$ID_KEL), ]

df2 <- subset(df2, grepl("^\\d+$", df2$ID_KEL))

Sorting the modified dataframe by MonthYear and resetting the index.

df2 <- df2[order(df2$MonthYear), ]
final_df <- df2
row.names(final_df) <- 1:nrow(final_df)
Visualisation of final_df, with the modified MonthYear column and index reset

3.1.8 Exporting .rds files out from the final dataframe

With all the data cleaning done, we finish by writing final_df out to a .rds file and using that file for the rest of the analysis, keeping the data storage footprint to a minimum.

aspatial_df <- write_rds(final_df, "data/the1data/rds/aspatial_df.rds")
aspatial_df <- read_rds("data/the1data/rds/aspatial_df.rds")

3.2 Geospatial Data

3.2.1 Importing shapefile data into a simple feature dataframe

The following code chunk imports the DKI Jakarta geospatial data into R as a simple feature dataframe.

geospatial_df <- st_read(dsn = "data/the1data/BATAS DESA DESEMBER 2019 DUKCAPIL DKI JAKARTA",
    layer = "BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA")
Reading layer `BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA' from data source `C:\teojp3\IS415_blog\data\the1data\BATAS DESA DESEMBER 2019 DUKCAPIL DKI JAKARTA' 
  using driver `ESRI Shapefile'
Simple feature collection with 269 features and 161 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 106.3831 ymin: -6.370815 xmax: 106.9728 ymax: -5.184322
Geodetic CRS:  WGS 84

3.2.2 Checking and ensuring the coordinate reference system of the data is correct

st_crs(geospatial_df)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

From the output above, we see that the data is in WGS 84, a geographic rather than a projected coordinate system. We therefore transform it to DGN95 / Indonesia TM-3 zone 54.1 (EPSG: 23845), which is based on Indonesia's national geodetic datum.

jakarta_DGN95 <- st_transform(geospatial_df, 23845)
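A quick sanity check (optional, not in the original post) confirms the transformation took effect:

```r
st_crs(jakarta_DGN95)$epsg  # should now report 23845
```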

3.2.3 Excluding the outer islands from the sf dataframe

The outer islands, also known as Kepulauan Seribu, will be excluded as they are detached from the mainland.

jakarta_DGN95 <- subset(jakarta_DGN95, KAB_KOTA != "KEPULAUAN SERIBU")

3.2.4 Subsetting necessary columns from the sf dataframe

For the analysis, we will only keep the first nine fields, whereby the last field is JUMLAH_PEN (Total Population).

jakarta_DGN95 <- jakarta_DGN95[, 0:9]
Visualisation of jakarta_DGN95 sf dataframe, with only the required 9 columns

3.2.5 Data cleaning and amending inaccuracies between the datasets

After manually checking through the identifier keys of the aspatial and geospatial data, some inaccuracies between them were discovered, so the necessary amendments were made.

If this data cleaning is not done, some sub-districts will be missing from the joined data (the plotted maps will have missing values).

jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "BALEKAMBANG"] <- "BALE KAMBANG"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "HALIM PERDANA KUSUMA"] <- "HALIM PERDANA KUSUMAH"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "JATIPULO"] <- "JATI PULO"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KALIBARU"] <- "KALI BARU"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KRAMATJATI"] <- "KRAMAT JATI"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PALMERIAM"] <- "PAL MERIAM"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PINANGRANTI"] <- "PINANG RANTI"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PAL MERAH"] <- "PALMERAH"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "TENGAH"] <- "KAMPUNG TENGAH"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PULOGADUNG"] <- "PULO GADUNG"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KALI DERES"] <- "KALIDERES"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "RAWAJATI"] <- "RAWA JATI"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KRENDANG"] <- "KERENDANG"

The following code checks whether the identifier keys between the aspatial and geospatial data match: it returns any sub-district names present in the aspatial data but missing from the geospatial data.

a <- c(aspatial_df$nama_kelurahan)
b <- c(jakarta_DGN95$DESA_KELUR)

a[!(a %in% b)]
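An equivalent formulation using setdiff() also makes it easy to check the mismatch in both directions (a sketch; the original only checks one direction):

```r
setdiff(aspatial_df$nama_kelurahan, jakarta_DGN95$DESA_KELUR)  # in aspatial only
setdiff(jakarta_DGN95$DESA_KELUR, aspatial_df$nama_kelurahan)  # in geospatial only
```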

3.2.6 Removing NA value rows from dataframe

We will drop the rows with NA values, as they would only distort our analysis results.

jakarta_DGN95 <- jakarta_DGN95 %>%
    drop_na(OBJECT_ID)

3.3 Geospatial Data Integration

3.3.1 Data preparation for Cumulative Case Rate by month (per 10,000 Population)

Functions from various packages will be used: e.g. pivot_wider() of tidyr package and group_by of dplyr package.

The formula we are using is:

COVID-19 Case Rate per 10,000 population (Monthly) = (Total Cumulative Cases / Total Population) x 10,000

Case_Rate <- aspatial_df %>%
    inner_join(jakarta_DGN95, by = c(nama_kelurahan = "DESA_KELUR")) %>%
    group_by(nama_kelurahan, MonthYear) %>%
    dplyr::summarise(Covid_Cases_Per_10000_Pop = ((sum(POSITIF)/JUMLAH_PEN) * 10000)) %>%
    ungroup() %>%
    pivot_wider(names_from = MonthYear, values_from = Covid_Cases_Per_10000_Pop)

3.3.2 Data preparation for Cumulative Death Rate by month (per 10,000 Population)

Functions from various packages will be used: e.g. pivot_wider() of tidyr package and group_by of dplyr package.

The formula we are using is:

COVID-19 Death Rate per 10,000 population (Monthly) = (Total Cumulative Deaths / Total Population) x 10,000

Death_Rate <- aspatial_df %>%
    inner_join(jakarta_DGN95, by = c(nama_kelurahan = "DESA_KELUR")) %>%
    group_by(nama_kelurahan, MonthYear) %>%
    dplyr::summarise(Death_Rate_Per_10000 = ((sum(Meninggal)/JUMLAH_PEN) * 10000)) %>%
    ungroup() %>%
    pivot_wider(names_from = MonthYear, values_from = Death_Rate_Per_10000)

3.3.3 Data Preparation for Relative Risk mapping (with regards to Cumulative Death Rate)

Preparing another dataframe, Positive_Cases, to be joined with the Death_Rate dataframe in the upcoming steps, adding the extra columns needed for relative risk mapping later on (i.e. POSITIF and Meninggal).

Positive_Cases <- aspatial_df %>%
    select("MonthYear", "nama_kelurahan", "POSITIF", "Meninggal")

Positive_Cases <- Positive_Cases[Positive_Cases$MonthYear == "2021-07-31", ]

The Positive_Cases dataframe will now be integrated into the Death_Rate dataframe.

Death_Rate <- left_join(Positive_Cases, Death_Rate, by = c(nama_kelurahan = "nama_kelurahan"))

Death_Rate <- subset(Death_Rate, select = -c(MonthYear))

3.3.4 Renaming the columns for both dataframes for easier identification during analysis

The columns will be renamed and simplified for easier identification, e.g. dropping the day from the MonthYear values ("2021-02-28" becomes "02-2021").

colnames(Case_Rate) <- c("SUB_DISTRICT", "03-2020", "04-2020", "05-2020", "06-2020",
    "07-2020", "08-2020", "09-2020", "10-2020", "11-2020", "12-2020", "01-2021",
    "02-2021", "03-2021", "04-2021", "05-2021", "06-2021", "07-2021")

colnames(Death_Rate) <- c("SUB_DISTRICT", "POSITIVE_CASES", "DEATHS", "03-2020",
    "04-2020", "05-2020", "06-2020", "07-2020", "08-2020", "09-2020", "10-2020",
    "11-2020", "12-2020", "01-2021", "02-2021", "03-2021", "04-2021", "05-2021",
    "06-2021", "07-2021")
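As an alternative to hard-coding the month labels, they can be generated from a date sequence (a sketch following the same MM-YYYY convention, not the code used above):

```r
# Build "03-2020" ... "07-2021" from a monthly date sequence.
months <- format(seq(as.Date("2020-03-01"), as.Date("2021-07-01"), by = "month"),
                 "%m-%Y")
# e.g. colnames(Case_Rate) <- c("SUB_DISTRICT", months)
```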

3.3.5 Additional data cleaning to ensure cleaned data is used for analysis

A quick check to ensure there are no NA values:

Case_Rate[rowSums(is.na(Case_Rate)) != 0, ]

The Death_Rate dataframe is found to have NA rows, so we delete them using the na.omit() function.

Death_Rate[rowSums(is.na(Death_Rate)) != 0, ]

Death_Rate <- na.omit(Death_Rate)

3.3.6 Exporting .rds files and using them for efficient data storage

.rds files are used as a form of best practice, reducing the storage space needed for the required data.

Case_Rate_rds <- write_rds(Case_Rate, "data/the1data/rds/Case_Rate_rds.rds")
Death_Rate_rds <- write_rds(Death_Rate, "data/the1data/rds/Death_Rate_rds.rds")
Case_Rate_rds <- read_rds("data/the1data/rds/Case_Rate_rds.rds")
Death_Rate_rds <- read_rds("data/the1data/rds/Death_Rate_rds.rds")

4. Geospatial Analysis

4.1 Data Preparation for Thematic Mapping

4.1.1 Joining Geospatial data (jakarta_DGN95) with Aspatial data (Case Rate and Death Rate)

left_join of dplyr is used to join the geospatial data with the attributes from the aspatial data, using the sub-district names as the common identifier ("DESA_KELUR" for jakarta_DGN95, and "SUB_DISTRICT" for Case_Rate_rds/Death_Rate_rds).

Case_Rate_Final <- left_join(jakarta_DGN95, Case_Rate_rds, by = c(DESA_KELUR = "SUB_DISTRICT"))

Death_Rate_Final <- left_join(jakarta_DGN95, Death_Rate_rds, by = c(DESA_KELUR = "SUB_DISTRICT"))

4.1.2 Checking for NA rows in the combined dataframes

Once again, we check for NA rows to ensure reliable analysis results.

Case_Rate_Final[rowSums(is.na(Case_Rate_Final)) != 0, ]
Simple feature collection with 0 features and 26 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: DGN95 / Indonesia TM-3 zone 54.1
 [1] OBJECT_ID  KODE_DESA  DESA       KODE       PROVINSI   KAB_KOTA  
 [7] KECAMATAN  DESA_KELUR JUMLAH_PEN 03-2020    04-2020    05-2020   
[13] 06-2020    07-2020    08-2020    09-2020    10-2020    11-2020   
[19] 12-2020    01-2021    02-2021    03-2021    04-2021    05-2021   
[25] 06-2021    07-2021    geometry  
<0 rows> (or 0-length row.names)
Death_Rate_Final[rowSums(is.na(Death_Rate_Final)) != 0, ]
Simple feature collection with 0 features and 28 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: DGN95 / Indonesia TM-3 zone 54.1
 [1] OBJECT_ID      KODE_DESA      DESA           KODE          
 [5] PROVINSI       KAB_KOTA       KECAMATAN      DESA_KELUR    
 [9] JUMLAH_PEN     POSITIVE_CASES DEATHS         03-2020       
[13] 04-2020        05-2020        06-2020        07-2020       
[17] 08-2020        09-2020        10-2020        11-2020       
[21] 12-2020        01-2021        02-2021        03-2021       
[25] 04-2021        05-2021        06-2021        07-2021       
[29] geometry      
<0 rows> (or 0-length row.names)

No NA rows were found, so we are good to proceed on.

4.2 Thematic Mapping

4.2.1 Small multiples choropleth maps to see the distribution of COVID Cases and Deaths: March 2020 (First Month)

Plotting small multiples choropleth maps using KAB_KOTA field (Municipality Level, e.g. West Jakarta, North Jakarta etc). March 2020 is the first month that we have data on so we will be using March 2020 data to start this study.

tm_shape(Case_Rate_Final) + tm_fill("03-2020", style = "jenks", palette = "Blues",
    thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
    tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative confirmed Case\nRate per 10,000 Population (March 2020)",
        main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

From the plotted map, the distribution appears fairly even across the municipalities (no municipality shows an outlandishly high case rate).

However, Gelora in Tanah Abang, Jakarta Pusat, and the area surrounding Senayan in Kebayoran Baru, Jakarta Selatan, appear to have slightly more cases than the other sub-districts, as shown by their darker blue shading.

tm_shape(Death_Rate_Final) + tm_fill("03-2020", style = "jenks", palette = "Greens",
    thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
    tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative Death Rate\nper 10,000 Population (March 2020)",
        main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

Similar to the case rate, the distribution appears fairly even across the districts (no district shows an outlandishly high number of deaths).

However, the sub-districts of Gondangdia in Menteng, Jakarta Pusat, and Karet Semanggi in Setiabudi, Jakarta Selatan, have slightly more deaths than the other sub-districts.

4.2.2 Small multiples choropleth maps to see the distribution of COVID Cases and Deaths: July 2021 (Last Month)

July 2021 is the last month that we have data on, so we will be using it to see the change in distribution from the first to the last month.

tmap_mode("plot")

tm_shape(Case_Rate_Final) + tm_fill("07-2021", style = "jenks", palette = "Blues",
    thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
    tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative confirmed Case\nRate per 10,000 Population (July 2021)",
        main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

Note that this does not take into account the difference in the total number of cases between March 2020 and July 2021; the visualisation simply serves as a means of comparing the distribution of cases across Jakarta.

Similar to the first month, a fairly even distribution is seen across the districts.

However, the sub-district of Gambir in Gambir, Jakarta Pusat, stands out with a much higher case rate than the other sub-districts, as shown by the darker blue shading.

tm_shape(Death_Rate_Final) + tm_fill("07-2021", style = "jenks", palette = "Greens",
    thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
    tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative Death Rate\nper 10,000 Population (July 2021)",
        main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

Similar to the case rate, a seemingly even distribution is seen across the sub-districts of Jakarta.

However, the sub-district of Gambir in Gambir, Jakarta Pusat, is an outlier here as well, with a much higher death rate than the other sub-districts, as shown by the darker green shading.

P.S. As mentioned earlier, these visualisations do not take into account the difference in the number of cases between March 2020 and July 2021; we are only comparing the change in the distribution of cases across Jakarta.

4.2.3 Time-series choropleth map of Positive Cases Rate throughout March 2020 to July 2021

To take into account the increase in total cases, time-series choropleth maps will be used to do this analysis.

First, the summary function is used to find the minimum and maximum values of the Case_Rate_Final "07-2021" column (the column with the most up-to-date cumulative case numbers), so as to determine appropriate category breakpoints.

summary(Case_Rate_Final$`07-2021`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  187.3   545.1   658.3   705.3   779.7  3808.3 

With reference to the results above, the breakpoints for our breaks vector will be c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000); the final break of 4000 ensures the maximum value of 3808.3 falls within a class.

We will plot small multiple maps to see how the distribution of COVID cases has changed on a monthly basis from March 2020 to July 2021.

tm_shape(Case_Rate_Final) + tm_polygons(c("03-2020", "04-2020", "05-2020", "06-2020",
    "07-2020", "08-2020", "09-2020", "10-2020", "11-2020", "12-2020", "01-2021",
    "02-2021", "03-2021", "04-2021", "05-2021", "06-2021", "07-2021"), breaks = c(0,
    500, 1000, 1500, 2000, 2500, 3000, 4000), palette = "Reds") + tm_layout(panel.show = TRUE,
    panel.labels = c("03-2020", "04-2020", "05-2020", "06-2020", "07-2020", "08-2020",
        "09-2020", "10-2020", "11-2020", "12-2020", "01-2021", "02-2021", "03-2021",
        "04-2021", "05-2021", "06-2021", "07-2021"), panel.label.size = 3, asp = NA,
    legend.show = FALSE)

Similar to the visualisation above, but now at roughly five-month intervals (splitting the 17 months into four snapshots) to see how the distribution of COVID-19 cases changed from March 2020 to July 2021.

tm_shape(Case_Rate_Final) + tm_polygons(c("03-2020", "08-2020", "02-2021", "07-2021"),
    breaks = c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000), palette = "Reds") + tm_layout(panel.show = TRUE,
    panel.labels = c("03-2020", "08-2020", "02-2021", "07-2021"), panel.label.size = 3,
    asp = NA, legend.show = FALSE)

4.2.4 Time-series choropleth map of Death Rate throughout March 2020 to July 2021

The summary function is used again to find the minimum and maximum values of the Death_Rate_Final "07-2021" column (the column with the most up-to-date cumulative death numbers), so as to determine appropriate breakpoints.

summary(Death_Rate_Final$`07-2021`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.439   8.446  10.298  10.602  12.508  42.098 

Plotting small multiple maps to see how the distribution of deaths has changed on a monthly basis from March 2020 to July 2021.

tm_shape(Death_Rate_Final) + tm_polygons(c("03-2020", "04-2020", "05-2020", "06-2020",
    "07-2020", "08-2020", "09-2020", "10-2020", "11-2020", "12-2020", "01-2021",
    "02-2021", "03-2021", "04-2021", "05-2021", "06-2021", "07-2021"), breaks = c(0,
    5, 10, 15, 20, 25, 30, 35, 40, 45), palette = "Purples") + tm_layout(panel.show = TRUE,
    panel.labels = c("03-2020", "04-2020", "05-2020", "06-2020", "07-2020", "08-2020",
        "09-2020", "10-2020", "11-2020", "12-2020", "01-2021", "02-2021", "03-2021",
        "04-2021", "05-2021", "06-2021", "07-2021"), panel.label.size = 3, asp = NA,
    legend.show = FALSE)

Similar to the visualisation above, but at roughly five-month intervals (splitting the 17 months into four snapshots) to see how the distribution of deaths from COVID-19 changed from March 2020 to July 2021.

tm_shape(Death_Rate_Final) + tm_polygons(c("03-2020", "08-2020", "02-2021", "07-2021"),
    breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45), palette = "Purples") + tm_layout(panel.show = TRUE,
    panel.labels = c("03-2020", "08-2020", "02-2021", "07-2021"), panel.label.size = 3,
    legend.show = FALSE)

4.2.5 Interactive map of cumulative Case Rate (March 2020)

tmap_mode("view") switches tmap to interactive viewing, so the mainland map below can be panned and zoomed.

tmap_mode("view")

tm_shape(Case_Rate_Final) + tm_fill("03-2020", breaks = c(0, 4, 13, 53, 119, 270,
    395, 3808), palette = "Reds", title = "Cumulative Case Rate") + tm_layout(main.title = "Cumulative Confirmed Case Rate in March 2020",
    main.title.position = "center", main.title.size = 1.2, legend.outside = TRUE,
    legend.height = 0.45, legend.width = 0.35, frame = TRUE) + tm_borders(alpha = 0.5) +
    tm_compass(type = "8star", size = 2, position = c("left", "bottom")) + tm_scale_bar(position = c("left",
    "bottom")) + tm_grid(alpha = 0.2)

4.2.6 Percentile Map

A percentile map is a special type of quantile map with six categories, using breakpoints at the 0th, 1st, 10th, 50th, 90th, 99th and 100th percentiles. We first compute these percentiles for the "07-2021" column of Case_Rate_Final.

percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
var <- Case_Rate_Final["07-2021"] %>%
    st_set_geometry(NULL)
quantile(var[, 1], percent)
       0%        1%       10%       50%       90%       99%      100% 
 187.3189  286.2999  458.1759  658.3395  939.6568 1966.0003 3808.2902 
get.var <- function(vname, df) {
    v <- df[vname] %>%
        st_set_geometry(NULL)
    v <- unname(v[, 1])
    return(v)
}
percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
var <- get.var("07-2021", Case_Rate_Final)
bperc <- quantile(var, percent)
tm_shape(jakarta_DGN95) + tm_polygons() + tm_shape(Case_Rate_Final) + tm_fill("07-2021",
    title = "07-2021", breaks = bperc, palette = "Blues", labels = c("< 1%", "1% - 10%",
        "10% - 50%", "50% - 90%", "90% - 99%", "> 99%")) + tm_borders() + tm_layout(title = "Percentile Map",
    title.position = c("right", "bottom"))
percentmap <- function(vnam, df, legtitle = NA, mtitle = "Percentile Map") {
    percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
    var <- get.var(vnam, df)
    bperc <- quantile(var, percent)
    tm_shape(jakarta_DGN95) + tm_polygons() + tm_shape(df) + tm_fill(vnam, title = legtitle,
        breaks = bperc, palette = "Blues", labels = c("< 1%", "1% - 10%", "10% - 50%",
            "50% - 90%", "90% - 99%", "> 99%")) + tm_borders() + tm_layout(main.title = mtitle,
        main.title.position = "center", main.title.size = 1.2, legend.outside = TRUE,
        legend.height = 0.45, legend.width = 0.35, frame = TRUE) + tm_compass(type = "8star",
        size = 2, position = c("left", "bottom")) + tm_scale_bar(position = c("left",
        "bottom"))
}
CR_July2021_percentmap <- percentmap("07-2021", Case_Rate_Final,
    mtitle = "Cumulative Confirmed Case Rate in July 2021")
CR_March2020_percentmap <- percentmap("03-2020", Case_Rate_Final,
    mtitle = "Cumulative Confirmed Case Rate in March 2020")
CR_March2020_percentmap

4.2.7 Box Map

A box map augments a quartile map with lower- and upper-outlier categories, defined by the 1.5 x IQR fences of a box plot. The boxbreaks() function below computes these breakpoints.

boxbreaks <- function(v, mult = 1.5) {
    qv <- unname(quantile(v))
    iqr <- qv[4] - qv[2]
    upfence <- qv[4] + mult * iqr
    lofence <- qv[2] - mult * iqr
    # initialize break points vector
    bb <- vector(mode = "numeric", length = 7)
    # logic for lower and upper fences no lower outliers
    if (lofence < qv[1]) {
        bb[1] <- lofence
        bb[2] <- floor(qv[1])
    } else {
        bb[2] <- lofence
        bb[1] <- qv[1]
    }
    if (upfence > qv[5]) {
        # no upper outliers
        bb[7] <- upfence
        bb[6] <- ceiling(qv[5])
    } else {
        bb[6] <- upfence
        bb[7] <- qv[5]
    }
    bb[3:5] <- qv[2:4]
    return(bb)
}
get.var <- function(vname, df) {
    v <- df[vname] %>%
        st_set_geometry(NULL)
    v <- unname(v[, 1])
    return(v)
}
July2021_No_Zero <- Case_Rate_Final %>%
    filter(`07-2021` >= 0)
var <- get.var("07-2021", Case_Rate_Final)
boxbreaks(var)
[1]  187.3189  193.3497  545.1432  658.3395  779.6722 1131.4658
[7] 3808.2902
boxmap <- function(vnam, df, legtitle = NA, mtitle = "Box Map", mult = 1.5) {
    var <- get.var(vnam, df)
    bb <- boxbreaks(var)
    tm_shape(df) + tm_fill(vnam, title = legtitle, breaks = bb, palette = "Blues",
        labels = c("lower outlier", "< 25%", "25% - 50%", "50% - 75%", "> 75%", "upper outlier")) +
        tm_borders() + tm_layout(main.title = "Cumulative Confirmed Case Rate in July 2021",
        main.title.position = "center", main.title.size = 1.2, legend.outside = TRUE,
        legend.height = 0.45, legend.width = 0.35, frame = TRUE) + tm_compass(type = "8star",
        size = 2, position = c("left", "bottom")) + tm_scale_bar(position = c("left",
        "bottom"))
}
boxmap("07-2021", Case_Rate_Final)

4.2.8 Relative Risk Mapping

To assess relative risk, we compute the Standardised Mortality Ratio (SMR) for each sub-district: observed deaths divided by expected deaths, where the expected deaths are derived from the sub-district's positive cases and its cumulative death rate per 10,000.

Death_Rate_Final <- Death_Rate_Final %>%
    mutate(EXPECTED_DEATH = (POSITIVE_CASES * `07-2021`)/10000)

Death_Rate_Final <- Death_Rate_Final %>%
    mutate(SMR = DEATHS/EXPECTED_DEATH)
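The SMR column computed above can then be visualised directly. The choropleth below is a sketch of such a relative risk map (not from the original post); sub-districts with an SMR above 1 recorded more deaths than expected given their case counts.

```r
# Sketch: choropleth of the Standardised Mortality Ratio (assumed column
# name "SMR", created by the mutate() step above).
tm_shape(Death_Rate_Final) +
    tm_fill("SMR", style = "jenks", palette = "Oranges",
            title = "Standardised Mortality Ratio") +
    tm_borders(alpha = 0.5) +
    tm_layout(main.title = "Relative Risk of COVID-19 Deaths (July 2021)",
              main.title.position = "center", legend.outside = TRUE)
```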

Data Preparation for Relative Risk Mapping

# Alternative draft of the SMR computation (kept for reference, not run):
# Jakarta_Death_Final <- Jakarta_Death_Final %>%
#     group_by(Expected_Death, Sub_District) %>%
#     dplyr::summarise(SMR = Death / Expected_Death) %>%
#     ungroup()
# Jakarta_Death_Final <- Jakarta_Death_Final %>%
#     group_by(Sub_District, SMR) %>%
#     dplyr::summarise(SMR_Avg = sum(SMR) / 15)